Markov processes are used in a wide range of disciplines, including finance. The transition
densities of these processes are often unknown. However, the conditional characteristic functions
are more likely to be available, especially for Lévy-driven processes. We propose an empirical
likelihood approach, for both parameter estimation and model specification testing, based on 
the conditional characteristic function for processes with either continuous or discontinuous 
sample paths. Theoretical properties of the empirical likelihood estimator for parameters and 
a smoothed empirical likelihood ratio test for a parametric specification of the process are 
provided. Simulations and empirical case studies are carried out to confirm the effectiveness of 
the proposed estimator and test. 
1. Introduction

Let $\{X_t(\theta)\}_{t\in T}$ be a parametric $d$-dimensional Markov process defined by

$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dL_{t,\theta}, \qquad (1.1)$$

where $\mu(\cdot)$ is a $d$-dimensional drift function, $\sigma(\cdot)$ is a $d\times d$ matrix-valued function of
$X_t$, $L_{t,\theta}$ is a Lévy process in $R^d$ and $\theta\in\Theta\subset R^p$. When $L_t$ is a standard Brownian
motion, (1.1) is a diffusion process having a continuous sample path. When $L_t$ contains
the Brownian motion and a compound Poisson process, (1.1) becomes the jump diffusion
process. A stochastic process of form (1.1) has long been used to model stochastic
systems arising in physics, biology and other natural sciences. It has also been the fundamental
tool in financial modeling. We refer to Sundaresan [35] and Fan [16] for overviews,
Barndorff-Nielsen, Mikosch and Resnick [5] for recent developments on Lévy-driven processes
and Sørensen [33] for statistical inference. Important subclasses of (1.1) include (i)
the multivariate diffusion process defined by

$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dB_t, \qquad (1.2)$$

where $B_t$ is the standard Brownian motion in $R^d$ (Stroock and Varadhan [34] and
Øksendal [29]); (ii) the Vasicek with Merton jump model (VSK-MJ) defined by

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dB_t + J_t\,dN_t, \qquad (1.3)$$

where $\kappa$, $\alpha$ and $\sigma$ are unknown parameters and represent the mean reverting rate, long-run
mean and volatility of the process, respectively, $N_t$ is a Poisson process with intensity
$\lambda$ and $J_t$ is the random jump size independent of the filtration $\mathcal{F}_t$ up to time $t$ and has
a normal density $N(0,\eta^2)$ (Merton [28]); (iii) the Lévy-driven Ornstein-Uhlenbeck process
defined by

$$dX_t = -\lambda X_t\,dt + dL_{\lambda t}, \qquad X_0 > 0, \qquad (1.4)$$

where $L_t$ is a Lévy process with no Brownian part, a non-negative drift and a Lévy
measure which is zero on the negative half line, and the parameter $\lambda$ is positive (see
Barndorff-Nielsen and Shephard [6]).

Often a closed form expression for the transition density of process (1.1) is not available
except for some special processes, even if the transition density exists and is unique. This
fact prevents the use of maximum likelihood estimation (MLE) and of specification
tests based on the exact transition density. Recently Aït-Sahalia [2, 3] established expansions
for the transition densities so that parameter estimation could be based on the
approximate likelihood functions. Testing may also be formulated via the approximate
density; see Chen, Gao and Tang [10] and Aït-Sahalia, Fan and Peng [4] for such tests.
The conditional characteristic functions (CCF) are more likely to be available than the transition
densities for continuous-time models, especially for the Lévy-driven processes
through the celebrated Lévy-Khintchine representation. For instance, Duffie, Pan and
Singleton [15] derived the explicit form of the CCF for multivariate affine jump processes,
which include the Vasicek with Merton jump process given in (1.3). The CCF for the
Lévy-driven Ornstein-Uhlenbeck process (1.4) is established in Barndorff-Nielsen and
Shephard [6].

Statistical inference based on the characteristic functions was proposed by Feuerverger
and Mureika [20], Feuerverger and McDunnough [19] for independent observations and
Feuerverger [18] for discrete time series. Singleton [32] introduced the approach to inference
for parametric continuous-time Markov processes and showed that estimation can
be carried out based on the CCF without having to carry out the Fourier inversion.
Chacko and Viceira [8] proposed a generalized method of moments (GMM) estimator for
parameters at a finite number of frequencies of the CCF. Carrasco et al. [7] carried out
GMM estimation on a slowly diverging number of frequencies of the CCF to achieve
the optimal estimation efficiency offered by the MLE. Jiang and Knight [24] proposed
GMM estimators based on the joint characteristic function of the observed state variables.
Chen and Hong [9] proposed a test for multivariate processes based on the CCF
via a generalized spectral density approach.

In this paper, we first propose an empirical likelihood (Owen [30]) approach for parameter
estimation and model specification testing of a parametric Markov process via the
CCF. An empirical likelihood ratio is formulated for the unknown parameters assuming
specification (1.1), which leads to a non-parametric maximum likelihood estimator. The
proposed estimator may be viewed as a compromise between the GMM of Chacko and
Viceira [8], based on a finite number of frequencies, and the high-dimensional GMM of
Carrasco et al. [7]. The high-dimensional GMM approach requires ridging a high-dimensional
weighting matrix in order to avoid its singularity, and selecting the ridging
parameter can be computationally expensive. The proposed estimation utilizes a wide
range of frequency information in the parametric CCF, while keeping the computation
easily manageable.

We then formulate an empirical likelihood CCF-based model specification test for the
parametric process (1.1) via kernel smoothing. The proposed test extends the transition
density based tests of Qin and Lawless [31], Chen, Gao and Tang [10] and Aït-Sahalia,
Fan and Peng [4] to a CCF-based test. This largely increases the range of
continuous-time Markov processes which can be tested directly without relying on a transition
density approximation. The proposed test provides an alternative formulation of the
CCF-based test of Chen and Hong [9], which is based on an explicit $L_2$ measure between
a kernel estimator of the CCF and its parametric counterpart. It is largely distinct
from the above mentioned tests, except Chen and Hong [9], by targeting the CCF directly,
which is more readily available for continuous-time models than the transition
density functions. Another advantage of the proposed test is the empirical likelihood (EL)
formulation, which can produce an integrated likelihood ratio test in a nonparametric
setting. The proposed test utilizes some of the attractive properties of the EL, like internal
studentizing without an explicit variance estimation and good power performance. Extending
the proposed methods to the case of latent variables is quite challenging and
will be a part of our future research.

The paper is organized as follows. In Section 2, we introduce and evaluate the CCF- 
based empirical likelihood estimator. The model specification test is given in Section 3. 
Section 4 reports results from simulation studies. A set of 3-month
treasury bill rate data is analyzed in Section 5. All technical details are reported in the
Appendix.

2. Parameter estimation 

Let $\{X_{t\delta}\}_{t=1}^{n}$ be $n$ discretely sampled observations of (1.1). For notational simplicity,
we denote $X_{t\delta}$ as $X_t$, where the sampling interval $\delta$ is any fixed quantity. Let
$\psi_t(u;\theta) = E_\theta(e^{iu^T X_{t+1}} \mid X_t)$, for $u\in R^d$, be the conditional characteristic function.
We use $\bar{a}$ and $A^*$ to denote the conjugate of a complex number $a$ and the conjugate
transpose of the complex matrix $A$, respectively.

Let $\varepsilon_t(\tau;\theta) = w(u,r;X_t)\{e^{iu^T X_{t+1}} - \psi_t(u;\theta)\}$ for $\tau = (u^T,r^T)^T \in R^{2d}$, where $w(u,r;X_t)$
is a weight factor. Here $\varepsilon_t(\tau;\theta)$ can be regarded as "residuals" between $e^{iu^T X_{t+1}}$ and the
parametric CCF $\psi_t(u;\theta)$. The complex weight factor $w(u,r;X_t)$ satisfies
$\overline{w(u,r;X_t)} = w(-u,-r;X_t)$ and $|w(u,r;X_t)| = 1$ for any $u,r\in R^d$; its purpose is to utilize more
model information. Let $\theta_0$ be the true parameter and the unique solution of

$$E\{e^{iu^T X_{t+1}} - \psi_t(u;\theta) \mid X_t\} = 0 \quad\text{for all } u\in R^d. \qquad (2.1)$$

From the Markov property and (2.1), for any $\tau = (u^T,r^T)^T \in R^{2d}$,

$$E\{\varepsilon_t(\tau;\theta_0)\} = 0 \quad\text{and}\quad \mathrm{Cov}\{\varepsilon_{t_1}(\tau;\theta_0),\varepsilon_{t_2}(\tau;\theta_0)\} = 0 \quad\text{if } t_1\ne t_2. \qquad (2.2)$$

Let $\varepsilon_t^R(\tau;\theta)$ and $\varepsilon_t^I(\tau;\theta)$ be the real and imaginary parts of $\varepsilon_t(\tau;\theta)$ respectively, and
$\vec\varepsilon_t(\tau;\theta) = (\varepsilon_t^R(\tau;\theta),\varepsilon_t^I(\tau;\theta))^T$ be the real bivariate vector corresponding to $\varepsilon_t(\tau;\theta)$.

We now formulate an empirical likelihood for $\theta$ based on the CCF $\psi_t(u;\theta)$. The empirical
likelihood (EL) introduced in Owen [30] is a technique that allows construction
of a non-parametric likelihood for parameters of interest. Although the EL method is
intrinsically non-parametric, it possesses two important properties of a parametric likelihood,
the Wilks theorem and the Bartlett correction; see Chen and Van Keilegom [13] for
a recent overview and Kitamura, Tripathi and Ahn [27] for a formulation with conditional
moments.

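Let $p_1(\tau),\ldots,p_n(\tau)$ be probability weights allocated to the "residuals" $\{\vec\varepsilon_t(\tau;\theta)\}_{t=1}^{n}$.
A local EL for $\theta$ at $\tau$ is

$$L_n(\tau;\theta) = \max\prod_{t=1}^{n} p_t(\tau), \qquad (2.3)$$

subject to $\sum_{t=1}^{n} p_t(\tau) = 1$ and $\sum_{t=1}^{n} p_t(\tau)\vec\varepsilon_t(\tau;\theta) = 0$. Here the second constraint reflects
(2.1). Without the second constraint, the empirical likelihood is maximized at $p_t(\tau) = n^{-1}$ for
all $t$, which gives the maximum value $n^{-n}$. Let $\ell_n(\tau;\theta) = -2\log\{L_n(\tau;\theta)/n^{-n}\}$ be the local
log-EL ratio of $\theta$ at $\tau$.

Employing the EL algorithm (Owen [30]), the optimal $p_t(\tau)$ of the above optimization
problem (2.3) is

$$p_t(\tau) = \frac{1}{n}\,\frac{1}{1+\lambda(\tau;\theta)^T\vec\varepsilon_t(\tau;\theta)},$$

where $\lambda(\tau;\theta)$ is a Lagrange multiplier in $R^2$ that satisfies

$$Q_{1n}(\tau;\theta,\lambda) =: \frac{1}{n}\sum_{t=1}^{n}\frac{\vec\varepsilon_t(\tau;\theta)}{1+\lambda(\tau;\theta)^T\vec\varepsilon_t(\tau;\theta)} = 0. \qquad (2.4)$$

Hence, the local EL ratio becomes

$$\ell_n(\tau;\theta) = 2\sum_{t=1}^{n}\log\{1+\lambda(\tau;\theta)^T\vec\varepsilon_t(\tau;\theta)\}. \qquad (2.5)$$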
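The computation behind (2.4) and (2.5) at one fixed frequency and one fixed parameter value can be sketched as follows. This is a minimal illustration only: the function names `local_el_ratio` and `el_weights` are hypothetical, and Newton's method with step-halving is one common way to solve (2.4), not necessarily the algorithm used by the authors.

```python
import numpy as np

def local_el_ratio(eps, max_iter=100, tol=1e-10):
    """Solve (2.4) for lambda by Newton's method and return (lambda, ell).

    eps : array of shape (n, 2); row t is the real bivariate residual at one
          fixed tau and one fixed theta.
    ell : 2 * sum_t log(1 + lambda' eps_t), the local log-EL ratio (2.5).
    """
    eps = np.asarray(eps, dtype=float)
    lam = np.zeros(eps.shape[1])
    for _ in range(max_iter):
        denom = 1.0 + eps @ lam
        score = (eps / denom[:, None]).sum(axis=0)   # n * Q_1n(tau; theta, lambda)
        if np.linalg.norm(score) < tol:
            break
        jac = -(eps / denom[:, None] ** 2).T @ eps   # derivative of score in lambda
        step = np.linalg.solve(jac, -score)
        # Step-halving keeps every implied weight p_t strictly positive.
        while np.any(1.0 + eps @ (lam + step) <= 1e-8):
            step = 0.5 * step
        lam = lam + step
    ell = 2.0 * np.log1p(eps @ lam).sum()
    return lam, ell

def el_weights(eps, lam):
    """Optimal weights p_t = 1 / {n (1 + lambda' eps_t)}."""
    eps = np.asarray(eps, dtype=float)
    return 1.0 / (eps.shape[0] * (1.0 + eps @ lam))
```

When the residuals average exactly to zero the multiplier is zero and the ratio vanishes; otherwise the ratio is positive and grows with the distance of the residual mean from zero, which is what the estimator in this section minimizes after integrating over frequencies.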
Integrating $\ell_n(\tau;\theta)$ against a probability weight $\pi(\tau)$, which is supported on a compact
set $S$ in $R^{2d}$, an integrated empirical likelihood ratio for $\theta$ is

$$\ell_n(\theta) = \int_{\tau\in R^{2d}} \ell_n(\tau;\theta)\pi(\tau)\,d\tau. \qquad (2.6)$$

The maximum EL estimator (MELE) for $\theta$ is defined as

$$\hat\theta_n = \arg\min_{\theta}\,\ell_n(\theta),$$

where minimization is used because the EL ratio $\ell_n(\tau;\theta)$ has been multiplied by $-2$.

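Like Qin and Lawless [31], we first show that there exists a consistent estimator $\hat\theta_n$
with a certain rate of convergence as follows.

Lemma 1. Under Conditions C1-C4 given in the Appendix, with probability one, $\ell_n(\theta)$
attains its minimum at $\hat\theta_n$ in the interior of the ball $\|\theta-\theta_0\| \le O(n^{-1/3})$, and $\hat\theta_n$ and
$\lambda(\tau;\hat\theta_n)$ satisfy

$$Q_{1n}(\tau;\hat\theta_n,\lambda(\tau;\hat\theta_n)) = 0 \ \text{ for all } \tau\in S \quad\text{and}\quad \int Q_{2n}(\tau;\hat\theta_n,\lambda(\tau;\hat\theta_n))\pi(\tau)\,d\tau = 0, \qquad (2.7)$$

where $Q_{1n}$ is defined in (2.4) and

$$Q_{2n}(\tau;\theta,\lambda) = \frac{1}{n}\sum_{t=1}^{n}\frac{1}{1+\lambda(\tau;\theta)^T\vec\varepsilon_t(\tau;\theta)}\left(\frac{\partial\vec\varepsilon_t(\tau;\theta)}{\partial\theta}\right)^T\lambda(\tau;\theta). \qquad (2.8)$$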

Before deriving the asymptotic normality of the 9 n , we define 

M * = \(i-l _--l)> £t(r;0) = (e t (r;0), et (-r;0)) T , 

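$$A(\tau_1,\tau_2;\theta_1,\theta_2) = \mathrm{Cov}\{\tilde\varepsilon_t(\tau_1;\theta_1),\tilde\varepsilon_t(\tau_2;\theta_2)\},$$

$$\Gamma(\theta_0) =: \int E\left(\frac{\partial\tilde\varepsilon_t^{*}(\tau;\theta_0)}{\partial\theta}\right)A^{-1}(\tau,\tau;\theta_0,\theta_0)\,E\left(\frac{\partial\tilde\varepsilon_t(\tau;\theta_0)}{\partial\theta}\right)\pi(\tau)\,d\tau \qquad (2.9)$$

and

$$V(\theta_0) = \int\!\!\int E\left(\frac{\partial\tilde\varepsilon_t^{*}(\tau_1;\theta_0)}{\partial\theta}\right)A^{-1}(\tau_1,\tau_1;\theta_0,\theta_0)\,A(\tau_1,\tau_2;\theta_0,\theta_0)\,A^{-1}(\tau_2,\tau_2;\theta_0,\theta_0)\,E\left(\frac{\partial\tilde\varepsilon_t(\tau_2;\theta_0)}{\partial\theta}\right)\pi(\tau_1)\pi(\tau_2)\,d\tau_1\,d\tau_2,$$

where $\tilde\varepsilon_t(\tau;\theta) = (\varepsilon_t(\tau;\theta),\varepsilon_t(-\tau;\theta))^T$.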

Theorem 1. Under Conditions C1-C4 given in the Appendix, for the estimator $\hat\theta_n$ in
Lemma 1, we have $\sqrt{n}(\hat\theta_n-\theta_0) \xrightarrow{d} N(0,\Sigma)$ where $\Sigma = \Gamma^{-1}(\theta_0)V(\theta_0)\Gamma^{-1}(\theta_0)$.
The proposed estimator attains the $\sqrt{n}$-rate of convergence. It is computationally
stable because computing $\ell_n(\tau;\theta)$ for one $\tau$ at a time is essentially a one-dimensional
problem. Note that Carrasco et al. [7] considered CCF-based generalized method of
moments estimation by considering a continuum of $\tau$ values in a functional space via a covariance
operator, but the covariance operator may not be invertible due to zero eigenvalues.
Hence, Carrasco et al. [7] needed ridging to avoid the non-invertibility issue, which makes the
computation quite involved.

3. Test for model specification 

In this section we consider testing for the validity of (1.1) via testing for the parametric
specification of the CCF $\psi_t(u;\theta)$. Tests for model specification of a continuous-time
Markov process have been proposed by Chen, Gao and Tang [10] and Aït-Sahalia, Fan
and Peng [4]. Despite the fact that parameter estimation based on the transition density
is asymptotically efficient, it is unclear if a test based on the transition density is more
powerful than one based on the CCF. The choice is clearer when the transition density
does not admit a closed form while the CCF does, since the CCF-based test is then valid
for any value of the sampling interval $\delta$.

Let the underlying process that generates the observed sample path $\{X_t\}_{t=1}^{n}$ be

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dL_t, \qquad (3.1)$$

whose CCF is $\psi(u;X_t)$. The process (1.1) is a parametric specification of (3.1). To emphasize
the dependence of the CCF on $X_t$, we write in this section $\psi_t(u)$ as $\psi(u,X_t)$,
$\psi_t(u;\theta)$ as $\psi(u,X_t;\theta)$ and other quantities in a similar fashion. We consider testing

$$H_0:\ P\{\psi_t(u) = \psi_t(u;\theta_0)\} = 1 \quad\text{for all } u\in R^d \text{ and some } \theta_0\in\Theta,$$

against a sequence of local alternative hypotheses

$$H_1:\ P\{\psi_t(u) = \psi_t(u;\theta_0) + c_n\Delta_n(u;X_t)\} = 1 \quad\text{for all } u\in R^d,$$

where $\{c_n\}$ is a sequence of non-random real constants converging to zero at a certain
rate, and $\{\Delta_n(u;X_t)\}$ is a sequence of bounded complex functions which are continuous
at $u = 0$ and $\Delta_n(0;X_t) = 0$; see Condition C6 in the Appendix for extra restrictions.

Since the target of inference is a conditional quantity, we need to work with a kernel
smoothed version of $\ell_n(\theta)$. Let $K$ be a kernel function which is a symmetric probability
density in $R^d$, and $h$ be a smoothing bandwidth that tends to $0$ as $n\to\infty$. A smoothed
version of $L_n(\tau;\theta)$ is

$$L_{nh}(\tau,x;\theta) = \max\prod_{t=1}^{n} p_t(\tau,x), \qquad (3.2)$$

subject to $\sum_{t=1}^{n} p_t(\tau,x) = 1$ and $\sum_{t=1}^{n} p_t(\tau,x)K_h(x-X_t)\vec\varepsilon(\tau,X_t;\theta) = 0$.
Let $\ell_{nh}(\tau,x,\theta) = -2\log\{L_{nh}(\tau,x,\theta)n^{n}\}$ be the log-EL ratio. Then the integrated log-EL
ratio for $\theta$ is

$$\ell_{nh}(\theta) = \int\!\!\int \ell_{nh}(\tau,x,\theta)\pi_1(\tau)\pi_2(x)\,d\tau\,dx,$$

where $\pi_1$ and $\pi_2$ are probability weight functions on the frequency space and the state
space, respectively. We can choose $\pi_1$ to be the same as the $\pi$ in Section 2.

The test statistic is $\ell_{nh}(\hat\theta_n)$, where $\hat\theta_n$ is the empirical likelihood estimator proposed
in Section 2. As a matter of fact, we can employ any estimator with $n^{1/2}$-rate
of convergence. To appreciate the meaning of the test statistic, let $W_h(x-X_t) =
K_h(x-X_t)/\sum_{j=1}^{n}K_h(x-X_j)$ be the Nadaraya-Watson kernel weight, $\varepsilon_{n,h}(\tau,x;\theta) =
\sum_{t=1}^{n}W_h(x-X_t)\varepsilon(\tau,X_t;\theta)$ be the kernel smooth of the residuals, $\tilde\varepsilon_{n,h}(\tau,x;\theta) =
(\varepsilon_{n,h}(\tau,x;\theta),\varepsilon_{n,h}(-\tau,x;\theta))^T$ and $R(K) = \int K^2(t)\,dt$. It can be shown by a derivation
similar to that in Chen, Härdle and Li [11] that

$$\ell_{nh}(\theta) = nh^{d}R^{-1}(K)\int\!\!\int \tilde\varepsilon_{n,h}^{*}(\tau,x;\theta)V^{-1}(\tau,x;\theta_0,\theta)\tilde\varepsilon_{n,h}(\tau,x;\theta)\,\pi_1(\tau)f(x)\pi_2(x)\,d\tau\,dx + O_p\{(nh^{d})^{-1/2}\log^{3}(n) + h^{2}\log^{2}(n)\}, \qquad (3.3)$$

where $V(\tau,x;\theta_0,\theta) = \mathrm{Var}\{\tilde\varepsilon(\tau,X_t;\theta)\mid X_t = x\}$, and $f(x)$ is the density of $X_t$. So, the
test statistic is asymptotically equivalent to an $L_2$-measure of the averaged "residuals"
$\tilde\varepsilon_{n,h}(\tau,x;\theta)$, inversely weighted by the covariance matrix function $V$. Hence the proposed
test is similar in tone to Fan and Zhang [17] for testing diffusion processes, and to Härdle
and Mammen [23] and Wang and Van Keilegom [37] for testing regression functions.

We need the following notation to describe the power property. Let $V(\tau_1,\tau_2,x) =
E\{\tilde\varepsilon(\tau_1,X_t;\theta_0)\tilde\varepsilon^{*}(\tau_2,X_t;\theta_0)\mid X_t = x\}$, then $V(\tau,\tau,x) = V(\tau,x;\theta_0,\theta_0)$, defined earlier.
Express the matrices

$$V(\tau_1,\tau_2,x) = (V_{lk}(\tau_1,\tau_2,x))_{1\le l,k\le 2} \quad\text{and}\quad V^{-1}(\tau,x) = (\nu^{lk}(\tau,x))_{1\le l,k\le 2}.$$

Furthermore, we choose $c_n = n^{-1/2}h^{-d/4}$ and define

$$\eta_n(\tau,X_t) = w(\tau;X_t)\Delta_n(u,X_t), \qquad \tilde\eta_n(\tau,X_t) = (\eta_n(\tau,X_t),\eta_n(-\tau,X_t))^T,$$

$$\mu_n = \int\!\!\int \tilde\eta_n^{*}(\tau,x)V^{-1}(\tau,x;\theta_0,\theta_0)\tilde\eta_n(\tau,x)\,\pi_1(\tau)\pi_2(x)f(x)\,d\tau\,dx,$$

$\sigma^2 = 2R^{-2}(K)h^{-d}\gamma^2(K,V,\pi_1,\pi_2)$ where

$$\gamma^2(K,V,\pi_1,\pi_2) = K^{(4)}(0)\int\!\!\int\!\!\int \sum_{l_1,l_2,k_1,k_2}^{2} V_{l_1 l_2}(-\tau_1,\tau_2,x)V_{k_1 k_2}(\tau_1,-\tau_2,x)\,\nu^{l_1 k_1}(\tau_1,x)\,\nu^{l_2 k_2}(\tau_2,x)\,\pi_1(\tau_1)\pi_1(\tau_2)\pi_2^{2}(x)\,d\tau_1\,d\tau_2\,dx, \qquad (3.4)$$

where $K^{(4)}$ is the 4th convolution of the kernel function $K$.

The asymptotic normality of $\ell_{nh}(\hat\theta_n)$ is given in the following theorem.

Theorem 2. Under Conditions C1-C6 given in the Appendix,

$$h^{-d/2}(\ell_{nh}(\hat\theta_n) - 2 - h^{d/2}\mu_n) \xrightarrow{d} N(0,\,2R^{-2}(K)\gamma^2(K,V,\pi_1,\pi_2)). \qquad (3.5)$$

We note that $\mu_n = 0$ under $H_0$. Under $H_1$, since $\Delta_n(u,x)$ is non-vanishing with respect
to $u$, $\tilde\eta_n(\tau,x)$ is non-vanishing with respect to $u$ for all $x$ in the support of $f$, which leads
to a positive quantity $\mu_n$, due to $V^{-1}(\tau,x;\theta_0,\theta_0)$ being a Hermitian matrix. Since no
restriction has been imposed on the functional form of $\Delta_n(u,X_t)$, it means that the test
is powerful for a wide range of local alternatives. Indeed, if $\hat\gamma^2(K,V,\pi_1,\pi_2)$ is a consistent
estimator of $\gamma^2(K,V,\pi_1,\pi_2)$, the asymptotic normality-based test for $H_0$ with $\alpha$-level of
significance rejects $H_0$ if

$$h^{-d/2}\{\ell_{nh}(\hat\theta_n) - 2\} > z_{1-\alpha}\sqrt{2}\,R^{-1}(K)\hat\gamma(K,V,\pi_1,\pi_2),$$

where $z_{1-\alpha}$ is the $1-\alpha$ quantile of the standard normal distribution. Theorem 2 implies
that the power of the test under $H_1$ is approximately

$$\Phi\left(-z_{1-\alpha} + \frac{\mu_n}{\sqrt{2}\,R^{-1}(K)\gamma(K,V,\pi_1,\pi_2)}\right),$$

where $\Phi$ is the standard normal distribution function.

It is known that the choice of bandwidth is important in any test based on the kernel
smoothing technique. To make the test less sensitive to the choice of smoothing bandwidth,
we propose carrying out the test based on a set of bandwidths, say $\{h_1,\ldots,h_k\}$,
for a fixed integer $k$ such that $h_i = c_i h$ for some constants $c_1 < c_2 < \cdots < c_k$. Here $h$ is a
reference bandwidth which may be obtained via the cross-validation method.

This means that we have a set of the EL ratios $\{\ell_{nh_1}(\hat\theta_n),\ldots,\ell_{nh_k}(\hat\theta_n)\}$ corresponding
to the bandwidth set, and the overall test statistic is

$$T_n = \max_{1\le i\le k}\{h_i^{-d/2}(\ell_{nh_i}(\hat\theta_n) - 2)\}. \qquad (3.6)$$



To describe the asymptotic distribution of $T_n$, let $K^{(2)}(z,c) = \int K(u)K(z+cu)\,du$ be
a generalization to the convolution of $K$, $\nu(t) = \int\{K^{(2)}(tu,t)\}^2\,du$ and




Theorem 3. Under Conditions C1-C6, $T_n \xrightarrow{d} \max_{1\le i\le k} Z_i$ as $n\to\infty$, where
$(Z_1,\ldots,Z_k)^T \sim N(0,\Sigma_J)$.
Let $t_\alpha$ be the $1-\alpha$ quantile of $T_n$, where $\alpha\in(0,1)$ is the nominal size of the test. The
following parametric bootstrap procedure is employed to approximate $t_\alpha$:

Step 1: Simulate a sample path $\{X_t^{*}\}_{t=1}^{n}$ at the same frequency $\delta$ according to the
model under $H_0$ with the CCF based estimate $\hat\theta_n$.

Step 2: Let $\hat\theta_n^{*}$ be the estimate of $\theta$ under $H_0$ using the resampled path $\{X_t^{*}\}_{t=1}^{n}$ obtained
in Step 1, and $T_n^{*}$ be the version of $T_n$ for the resampled path.

Step 3: For a large positive integer $B$, repeat Steps 1 and 2 $B$ times and obtain, after
ranking, $T_n^{1*} \le T_n^{2*} \le \cdots \le T_n^{B*}$.

Then, the Monte Carlo approximation of $t_\alpha$ is $T_n^{[B(1-\alpha)]+1\,*}$. The proposed test rejects
$H_0$ if $T_n(\hat\theta_n) > T_n^{[B(1-\alpha)]+1\,*}$. The justification of the above bootstrap procedure can be
made based on Theorem 3 via the standard techniques, for instance those given in Chen,
Gao and Tang [10].
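Steps 1 to 3 above can be sketched generically as follows. The function `bootstrap_critical_value` and its callable arguments `simulate`, `estimate` and `statistic` are hypothetical placeholders for the model simulator under $H_0$, the CCF-based estimator and the statistic $T_n$; only the resampling and ranking logic is illustrated.

```python
import numpy as np

def bootstrap_critical_value(simulate, estimate, statistic, theta_hat, n,
                             B=250, alpha=0.05, seed=0):
    """Monte Carlo approximation of the 1 - alpha quantile of the test statistic.

    simulate(theta, n, rng) -> sample path of length n under the null model
    estimate(path)          -> parameter estimate from a path
    statistic(path, theta)  -> value of the test statistic for that path
    All three callables are user-supplied (assumed, not part of the paper).
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(B)
    for b in range(B):
        path = simulate(theta_hat, n, rng)        # Step 1: resample under H0
        theta_star = estimate(path)               # Step 2: re-estimate
        stats[b] = statistic(path, theta_star)    # Step 2: recompute the statistic
    stats.sort()                                  # Step 3: rank the B values
    k = min(int(np.floor(B * (1.0 - alpha))) + 1, B)   # 1-based order statistic
    return stats[k - 1]
```

The test then rejects when the statistic computed from the observed path exceeds the returned value. Fixing the seed makes the critical value reproducible across calls.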



4. Simulation study 

We report in this section the results from our simulation studies which are designed to
verify the proposed parameter estimator and model testing procedure. To evaluate the
quality of the proposed EL estimator, we first chose two univariate diffusion processes
with known transition densities, so that the MLEs can be compared with the proposed
EL estimates. The two processes are the Vasicek model (Vasicek [36]) (VSK),

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dB_t, \qquad (4.1)$$

and the Cox-Ingersoll-Ross model (Cox, Ingersoll and Ross [14]) (CIR),

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t, \qquad (4.2)$$

where $\kappa$, $\alpha$ and $\sigma$ are unknown parameters which represent the mean reverting rate,
long-run mean and volatility of the process, respectively. Both processes are widely
used in interest rate modeling and various option pricing formulations. For the Vasicek
model, the transition distribution of $X_{t+1}\mid X_t$ is a normal distribution
$N(\alpha + (X_t-\alpha)\exp(-\kappa\delta),\ \sigma^2(1-\exp(-2\kappa\delta))/(2\kappa))$. For the CIR model, when $2\kappa\alpha/\sigma^2 > 1$, $X_{t+1}\mid X_t$
is a multiple of a non-central Chi-square random variable with degrees of freedom
$4\kappa\alpha/\sigma^2$ and non-centrality parameter $cX_t\exp(-\kappa\delta)$, where the multiplier is $1/c$ with
$c = 4\kappa/(\sigma^2(1-\exp(-\kappa\delta)))$. The CCFs of these two models can easily be derived from
their known transition densities.
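For the Vasicek model, the derivation is immediate: a normal transition law with mean $m$ and variance $v$ has characteristic function $\exp(ium - u^2v/2)$. The sketch below evaluates it; the function names `vasicek_moments` and `vasicek_ccf` are illustrative choices, not taken from the paper.

```python
import numpy as np

def vasicek_moments(x, kappa, alpha, sigma, delta):
    """Mean and variance of X_{t+1} given X_t = x under the Vasicek model."""
    mean = alpha + (x - alpha) * np.exp(-kappa * delta)
    var = sigma**2 * (1.0 - np.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    return mean, var

def vasicek_ccf(u, x, kappa, alpha, sigma, delta):
    """Conditional characteristic function E[exp(i u X_{t+1}) | X_t = x]."""
    mean, var = vasicek_moments(x, kappa, alpha, sigma, delta)
    u = np.asarray(u, dtype=float)
    return np.exp(1j * u * mean - 0.5 * u**2 * var)
```

Comparing `vasicek_ccf` with the sample average of $e^{iuX}$ over draws from the normal transition law is a quick way to check an implementation before it is used inside the estimator.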

We then considered estimation for the jump diffusion model VSK-MJ as given in (1.3)
based on its CCF

$$\psi_t(u;\theta) = \exp\left\{\frac{u^2\sigma^2}{4\kappa}(e^{-2\kappa\delta}-1) - \lambda\delta + \gamma + i\left(\alpha u(1-e^{-\kappa\delta}) + ue^{-\kappa\delta}X_t\right)\right\}, \qquad (4.3)$$

where $\gamma = \lambda/(2\kappa)\int_{e^{-2\kappa\delta}}^{1}\exp(-\eta^2u^2y/2)/y\,dy$. For comparison, we approximated its transition
density by a mixture of normal distributions, $(1-\lambda\delta)N(\mu_\delta,\sigma_\delta^2) + \lambda\delta N(\mu_\delta,\sigma_\delta^2+\eta^2)$,
which is a first order approximation proposed in Aït-Sahalia, Fan and Peng [4]. Here,
$\mu_\delta = \alpha + (X_t-\alpha)\exp(-\kappa\delta)$, and $\sigma_\delta^2 = \sigma^2(1-\exp(-2\kappa\delta))/(2\kappa)$. The approximate MLEs
were obtained based on the mixture approximation given above.
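Both ingredients of this comparison, the CCF (4.3) and the mixture-of-normals density, are easy to evaluate numerically. The sketch below assumes a simple trapezoidal rule for the integral defining $\gamma$ (the grid size `n_grid` is an arbitrary choice) and uses hypothetical function names.

```python
import numpy as np

def vskmj_ccf(u, x, kappa, alpha, sigma, lam, eta, delta, n_grid=2001):
    """CCF (4.3) of the VSK-MJ model; gamma is computed by the trapezoidal rule."""
    u = np.asarray(u, dtype=float)
    decay = np.exp(-kappa * delta)
    y = np.linspace(decay**2, 1.0, n_grid)               # integration grid for gamma
    f = np.exp(-0.5 * eta**2 * u[..., None] ** 2 * y) / y
    dy = y[1] - y[0]
    integral = dy * (f.sum(axis=-1) - 0.5 * (f[..., 0] + f[..., -1]))
    gamma = lam / (2.0 * kappa) * integral
    gauss = u**2 * sigma**2 / (4.0 * kappa) * (decay**2 - 1.0)
    phase = alpha * u * (1.0 - decay) + u * decay * x
    return np.exp(gauss - lam * delta + gamma + 1j * phase)

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def vskmj_mixture_density(y, x, kappa, alpha, sigma, lam, eta, delta):
    """First order mixture approximation of the VSK-MJ transition density."""
    mu = alpha + (x - alpha) * np.exp(-kappa * delta)
    var = sigma**2 * (1.0 - np.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    p_jump = lam * delta                                  # approximate jump probability
    return ((1.0 - p_jump) * normal_pdf(y, mu, var)
            + p_jump * normal_pdf(y, mu, var + eta**2))
```

Two sanity checks follow directly from the formulas: the CCF equals one at $u = 0$ because $\gamma$ then reduces to $\lambda\delta$, and setting $\lambda = 0$ recovers the Gaussian CCF of the Vasicek model.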

We also consider the Inverse Gaussian OU process (IG-OU) in (1.4), that is, the process
$X_t$ follows the inverse Gaussian law $IG(a,b)$ for every $t$ when $X_0$ is generated from
$IG(a,b)$. The CCF of this process is

$$\psi_t(u;\theta) = \exp\left\{-a\left(\sqrt{-2iu+b^2} - \sqrt{-2iue^{-\lambda\delta}+b^2}\right) + iue^{-\lambda\delta}X_t\right\}. \qquad (4.4)$$

Since neither the exact transition density nor its approximation is available, we were
content with carrying out estimation with the proposed methods.

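The last simulation model considered for the estimation is a bivariate extension of the
univariate Ornstein-Uhlenbeck process (BI-OU),

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dB_t, \qquad (4.5)$$

where $X_t = (X_{1t},X_{2t})^T$, $\kappa = \begin{pmatrix}\kappa_{11} & 0\\ \kappa_{21} & \kappa_{22}\end{pmatrix}$, $\alpha = \begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}$ and $\sigma = \begin{pmatrix}\sigma_{11} & 0\\ 0 & \sigma_{22}\end{pmatrix}$. Under the condition
that the eigenvalues of the matrix $\kappa$ have positive real parts, the process is stationary
with transition distribution being a bivariate normal $N(m(\delta,X_t),\Omega(\delta))$, where $m(\delta,X_t) =
\alpha + \exp(-\kappa\delta)(X_t-\alpha)$, $\Omega(\delta) = \Sigma - \exp(-\kappa\delta)\Sigma\exp(-\kappa^T\delta)$ and

$$\Sigma = \frac{1}{2\,\mathrm{tr}(\kappa)\mathrm{Det}(\kappa)}\left\{\mathrm{Det}(\kappa)\sigma\sigma^T + (\kappa-\mathrm{tr}(\kappa)I)\sigma\sigma^T(\kappa-\mathrm{tr}(\kappa)I)^T\right\}.$$

The CCF of the process is known to be $\psi_t(u_1,u_2;\theta) = \exp\{iu^Tm(\delta,X_t) - u^T\Omega(\delta)u/2\}$
for $u = (u_1,u_2)^T$.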

We then carried out simulations to evaluate the ability of the proposed tests in detecting
model deviations. When we chose the simulation models, we had in mind two issues
in finance that have drawn considerable research attention recently. The first issue is
whether the process is subject to jumps, and the second is whether we could differentiate
two processes with different jump rates. Our simulation study formulated two settings of
hypotheses to address these two issues. In the first setting, we tested

$H_0$: The process is the VSK model.

In the second setting, we tested

$H_0$: The process is the jump diffusion model VSK-MJ.

For computing the powers, in the first setting we used the data simulated from $H_1$:
the jump diffusion model VSK-MJ to test the null model which does not have jumps;
in the second setting, we used the data simulated from $H_1$: the inverse Gaussian OU
model which has infinite-activity jumps to test the null hypothesis that prescribes a
finite-activity jump process.
For each model, we simulated 500 sample paths observed at monthly intervals
($\delta = 1/12$) for $n = 125, 250, 500$, respectively. The choices of parameter values
were motivated by Chen, Gao and Tang [10] and Aït-Sahalia, Fan and Peng [4].
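As an illustration of how one such monthly path can be generated, the sketch below simulates the Vasicek model exactly from its normal transition law. The function name `simulate_vasicek` is hypothetical, and drawing the initial value from the stationary law $N(\alpha,\sigma^2/(2\kappa))$ is an assumption made here, not a detail stated in the text.

```python
import numpy as np

def simulate_vasicek(n, kappa, alpha, sigma, delta, x0=None, rng=None):
    """Exact simulation of n observations of the Vasicek model at spacing delta."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.exp(-kappa * delta)                          # one-step autoregression
    sd = sigma * np.sqrt((1.0 - phi**2) / (2.0 * kappa))  # transition std. deviation
    noise = rng.standard_normal(n)
    x = np.empty(n)
    # Assumption: start from the stationary law unless x0 is supplied.
    x[0] = alpha + sigma / np.sqrt(2.0 * kappa) * noise[0] if x0 is None else x0
    for t in range(1, n):
        x[t] = alpha + (x[t - 1] - alpha) * phi + sd * noise[t]
    return x

# Example: one path with the parameter values of Table 1(a) and n = 125.
path = simulate_vasicek(125, kappa=0.858, alpha=0.089, sigma=0.047,
                        delta=1.0 / 12.0, rng=np.random.default_rng(1))
```

Because the transition law is used directly, there is no discretization error, in contrast to an Euler scheme.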

In parameter estimation, we discovered that for both real and imaginary parts of the
CCF, their nonparametric smoothing estimators are wave-like functions and roughly
diminish to zero at the same points, which creates a region denoted as $S_t$ (here the
subscript $t$ indicates that the region depends on $X_t$). In practice, we searched over a few
grid points in the data range of $X_t$ and picked the union of the $S_t$ as the support region
$S$ for the frequency domain of $\psi_t(u;\theta)$ in the estimation. We then chose the uniform
density as the weight function $\pi$ over the support region.

In model testing, a similar effort was initially made to obtain the support region of the
nonparametric CCF estimate, denoted as $S_{NP}$, and the support region of the theoretical
CCF under $H_0$, denoted as $S_{H_0}$. Here the theoretical CCF under $H_0$ used $\hat\theta_n$ from our
EL method. Then the support region of the frequency domain in testing was taken as
the union of $S_{NP}$ and $S_{H_0}$. We chose the uniform density as the weight function over
this support region for testing. There is little contribution to the integrated empirical
likelihood ratio $\ell_{nh}(\hat\theta_n)$ from outside the support region. The biweight kernel $K(u) =
\frac{15}{16}(1-u^2)^2 I(|u|\le 1)$ was used for smoothing in testing. The bandwidth selection is
described in Section 3. The bandwidth sets are specified in Tables 3 and 4 for the two
test settings. The values of the bandwidths were quite small, which
was due to the rapid oscillation of the CCF curves, which favored smaller bandwidths in
the curve fitting.
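For reference, the biweight kernel and the Nadaraya-Watson weights of Section 3 can be coded as below for a univariate state variable ($d = 1$). The function names are illustrative only.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = 15/16 (1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_weights(x, data, h):
    """Nadaraya-Watson weights W_h(x - X_t) for a scalar point x and bandwidth h."""
    data = np.asarray(data, dtype=float)
    k = biweight((x - data) / h) / h          # K_h(x - X_t) with d = 1
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observation within one bandwidth of x")
    return k / total
```

With bandwidths as small as those in Tables 3 and 4, only observations within $h$ of $x$ receive positive weight, so an evaluation point far from the data has no usable neighbours; the explicit error above guards against that case. For this kernel, $R(K) = \int K^2(t)\,dt = 5/7$.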

We chose $w(u,r;X_t) = e^{ir^TX_t}$ throughout our simulation study as it is the optimal
instrument suggested in Carrasco et al. [7]. Some numerical exploration (not reported)
indicated that the choice of the function $w(\cdot)$ is not crucial in the context of the paper. For
testing, we picked the unit instrument to reduce the computing burden.

Table 1 reports the empirical averages of the parameter estimates and their standard
errors as well as the true parameter values used for simulation. When the sample size
increases, standard errors of all the proposed estimates decrease, indicating the consistency
of the estimators. We observe from Table 1(a)-(b) that, for the VSK and CIR models
where the MLEs are available, the proposed EL estimates are quite close to the MLEs.
Although the EL estimates tend to have larger standard errors than the MLEs, we do
note that under the VSK model in Table 1(a), the biases of the EL estimates for the mean
reverting parameter $\kappa$ are smaller than those of the corresponding MLEs for all $n = 125$, $n = 250$
and $n = 500$. For the jump diffusion model VSK-MJ (Table 1(c)), we see that the EL estimates
are consistently more efficient than the approximate MLEs in the estimation of $\kappa$ and
the Poisson intensity $\lambda$. For the Inverse Gaussian OU model, which does not have the
MLE to compare with, the proposed estimates as reported in Table 1(d) are close to the
true values, and the standard errors decrease as the sample size increases.

Table 2 reports the estimates for the bivariate OU process and shows that the EL
estimates are close to the corresponding MLEs, providing further evidence of the
effectiveness of our EL estimator for multivariate process estimation. We also found that
the EL estimates for the long run mean $\alpha_1$ and the volatility $\sigma_{11}$ of the first process have
smaller biases and standard errors than the MLEs for all $n = 125$, $n = 250$ and $n = 500$.
Table 1. Empirical averages and their standard errors (in parentheses) of the maximum (MLE)
or approximate maximum (AMLE) likelihood estimates and the proposed empirical likelihood
estimates (EL) under the four univariate models

(a) Vasicek model

n     Method   $\kappa = 0.858$   $\alpha = 0.089$   $\sigma = 0.047$
125   MLE      1.383 (0.603)      0.090 (0.015)      0.047 (0.003)
      EL       1.305 (0.643)      0.090 (0.017)      0.046 (0.004)
250   MLE      1.118 (0.397)      0.090 (0.011)      0.047 (0.002)
      EL       1.052 (0.410)      0.089 (0.013)      0.046 (0.002)
500   MLE      0.966 (0.240)      0.089 (0.008)      0.047 (0.002)
      EL       0.951 (0.273)      0.089 (0.009)      0.047 (0.002)

(b) CIR model

n     Method   $\kappa = 0.892$   $\alpha = 0.091$   $\sigma = 0.181$
125   MLE      1.372 (0.644)      0.091 (0.019)      0.183 (0.012)
      EL       1.290 (0.719)      0.093 (0.023)      0.178 (0.014)
250   MLE      1.127 (0.374)      0.090 (0.013)      0.182 (0.008)
      EL       1.089 (0.435)      0.091 (0.015)      0.179 (0.009)
500   MLE      1.000 (0.245)      0.091 (0.010)      0.182 (0.006)
      EL       0.977 (0.290)      0.092 (0.011)      0.180 (0.007)

(c) Jump diffusion VSK-MJ model

n     Method   $\kappa = 0.858$   $\alpha = 0.089$   $\sigma = 0.047$   $\lambda = 2.0$   $\eta = 0.067$
125   AMLE     1.056 (0.381)      0.093 (0.020)      0.046 (0.005)      1.770 (0.723)     0.060 (0.016)
      EL       1.090 (0.261)      0.084 (0.031)      0.048 (0.009)      1.851 (0.323)     0.066 (0.020)
250   AMLE     0.977 (0.226)      0.093 (0.013)      0.047 (0.003)      1.659 (0.466)     0.059 (0.010)
      EL       1.043 (0.201)      0.090 (0.023)      0.048 (0.007)      1.825 (0.236)     0.068 (0.015)
500   AMLE     0.939 (0.145)      0.092 (0.009)      0.047 (0.002)      1.620 (0.311)     0.060 (0.007)
      EL       1.018 (0.115)      0.089 (0.018)      0.049 (0.005)      1.801 (0.163)     0.068 (0.012)

(d) Inverse Gaussian OU model

n     Method   $\lambda = 10.0$   $a = 1.0$          $b = 20.0$
125   EL       10.328 (3.665)     1.048 (0.106)      20.722 (2.146)
250   EL       11.154 (1.976)     1.059 (0.043)      21.380 (0.878)
500   EL       11.489 (1.652)     1.031 (0.024)      20.846 (0.461)
Table 2. Empirical averages and their standard errors (in parentheses) of the maximum (MLE)
likelihood estimates and the proposed empirical likelihood estimates (EL) under the Bivariate
OU model

n     Method   $\kappa_{11} = 0.22$   $\kappa_{21} = 0.2$   $\kappa_{22} = 0.5$
125   MLE      0.441 (0.197)          0.395 (0.270)         0.607 (0.176)
      EL       0.381 (0.208)          0.525 (0.238)         0.594 (0.192)
250   MLE      0.353 (0.165)          0.307 (0.148)         0.563 (0.110)
      EL       0.354 (0.178)          0.449 (0.184)         0.564 (0.153)
500   MLE      0.280 (0.118)          0.241 (0.104)         0.526 (0.068)
      EL       0.261 (0.168)          0.383 (0.154)         0.487 (0.112)

n     Method   $\alpha_1 = 0.08$   $\alpha_2 = 0.09$   $\sigma_{11} = 0.09$   $\sigma_{22} = 0.17$
125   MLE      0.145 (0.166)       0.099 (0.056)       0.167 (0.067)          0.080 (0.079)
      EL       0.141 (0.141)       0.117 (0.085)       0.129 (0.044)          0.071 (0.034)
250   MLE      0.141 (0.151)       0.096 (0.036)       0.140 (0.065)          0.116 (0.074)
      EL       0.142 (0.129)       0.094 (0.073)       0.095 (0.033)          0.094 (0.028)
500   MLE      0.102 (0.120)       0.092 (0.023)       0.115 (0.051)          0.146 (0.055)
      EL       0.099 (0.108)       0.104 (0.064)       0.077 (0.024)          0.105 (0.028)



Table 3. $H_0$: VSK versus $H_1$: the jump diffusion model VSK-MJ

(a) Size evaluation (in percentage)

n = 125   Bandwidth   0.012   0.017   0.021   0.025   0.030   Overall
          Size        4.6     5.6     5.4     5.8     5.6     4.8
n = 250   Bandwidth   0.012   0.015   0.018   0.021   0.024   Overall
          Size        5.6     6.2     6.2     6.0     5.8     5.4
n = 500   Bandwidth   0.011   0.013   0.015   0.018   0.020   Overall
          Size        5.0     5.6     5.6     5.4     5.6     5.0

(b) Power evaluation (in percentage)

n = 125   Bandwidth   0.016   0.021   0.026   0.032   0.037   Overall
          Power       72.0    71.6    70.4    69.2    65.8    72.2
n = 250   Bandwidth   0.016   0.019   0.022   0.026   0.029   Overall
          Power       82.4    82.4    82.2    82.4    82.2    82.6
n = 500   Bandwidth   0.014   0.017   0.019   0.021   0.024   Overall
          Power       95.0    94.8    94.6    94.4    94.2    94.8
Table 4. $H_0$: the jump diffusion model VSK-MJ versus $H_1$: the inverse Gaussian OU model

(a) Size evaluation (in percentage)

n = 125   Bandwidth   0.017   0.022   0.028   0.034   0.040   Overall
          Size        3.4     3.6     4.0     3.6     4.6     4.6
n = 250   Bandwidth   0.017   0.021   0.024   0.028   0.032   Overall
          Size        4.6     4.6     4.6     4.6     5.0     4.8
n = 500   Bandwidth   0.016   0.019   0.021   0.024   0.026   Overall
          Size        5.0     5.2     5.2     5.0     5.0     5.0

(b) Power evaluation (in percentage)

n = 125   Bandwidth   0.008   0.012   0.017   0.021   0.026   Overall
          Power       71.6    73.8    73.2    71.4    71.2    74.4
n = 250   Bandwidth   0.008   0.011   0.014   0.017   0.020   Overall
          Power       84.0    84.2    83.4    81.8    81.4    84.4
n = 500   Bandwidth   0.008   0.010   0.012   0.014   0.016   Overall
          Power       90.1    88.9    89.5    85.1    85.4    90.2



Tables 3 and 4 report the empirical size and power of the proposed test based on 
$B = 250$ bootstrap resampled paths for each simulation. They contain the sizes and 
powers for the overall test, which is based on the set of five bandwidths, and for the 
tests that use only one bandwidth. The tests gave satisfactory sizes under both 
testing settings. In the first test, where we used data from the jump diffusion model 
VSK-MJ to test the continuous diffusion model VSK, the powers range from 65% to 95% 
across the different sample sizes and bandwidths. In the second test, where we used data 
simulated from the infinite-activity jump process (the inverse Gaussian OU) to test the 
finite-activity jump process (the jump diffusion VSK-MJ), the powers range from 71% 
to 90% across the different sample sizes and bandwidth choices. 
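
To make the reported quantities concrete, the following minimal Python sketch shows how a bootstrap p-value and an empirical rejection rate (size under the null, power under the alternative) could be computed from bootstrap replicates. The function names and the standard normal stand-in statistic are assumptions made purely for illustration; they are not the smoothed empirical likelihood ratio statistic or the resampling scheme of the paper.

```python
import random


def bootstrap_p_value(observed, bootstrap_stats):
    """Fraction of bootstrap statistics at least as large as the observed one."""
    exceed = sum(1 for s in bootstrap_stats if s >= observed)
    return exceed / len(bootstrap_stats)


def rejection_rate(p_values, level=0.05):
    """Percentage of simulations in which the null hypothesis is rejected."""
    rejected = sum(1 for p in p_values if p < level)
    return 100.0 * rejected / len(p_values)


# Toy illustration: a placeholder statistic drawn under the null (assumption),
# compared against B = 250 bootstrap draws in each of 200 simulated data sets.
rng = random.Random(0)
B = 250
p_values = []
for _ in range(200):
    observed = rng.gauss(0.0, 1.0)
    boot = [rng.gauss(0.0, 1.0) for _ in range(B)]
    p_values.append(bootstrap_p_value(observed, boot))

empirical_size = rejection_rate(p_values, level=0.05)
print(f"empirical size: {empirical_size:.1f}%")
```

Under the null, the rejection rate should be near the nominal 5% level up to Monte Carlo error; replacing the null draws with draws from an alternative turns the same computation into a power estimate.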

We also compared our methods with Carrasco et al. [7] for estimation, and with Chen, 
Gao and Tang [10] for testing. To save space, we report the detailed results in the 
supplemental article (Chen, Peng and Yu [12]). 

5. A case study 

In this section, we examine empirically the capability of our testing procedure in detecting 
jumps using the secondary market quotes of the 3-month Treasury Bill (T-bill) between 
January 1, 1965 and February 2, 1999. The bill rate was sampled at monthly frequency, 
giving 410 observations in total. The mean of the rates is 0.065, the volatility is 0.026, 
the mean of the differences is very close to zero ($1.5 \times 10^{-5}$) and the standard deviation 
of the differences is 0.005. The sample period contains some large movements that turn 
out to coincide with arrivals of macroeconomic news (Johannes [25]). The goal of this 
empirical study was to test whether the underlying process is subject to jumps or not. 

Table 5. Empirical estimation for the 3-month T-bill data (standard errors in parentheses)

(a) VSK model

          $\kappa$    $\alpha$    $\sigma$
MLE       0.277       0.065       0.019
          (0.1800)    (0.0117)    (0.0007)
EL        0.274       0.059       0.018
          (0.1956)    (0.0136)    (0.0007)

(b) CIR model

          $\kappa$    $\alpha$    $\sigma$
MLE       0.182       0.066       0.061
          (0.1697)    (0.0179)
EL        0.182       0.064       0.057
          (0.1934)    (0.0374)    (0.0021)

(c) VSK-MJ model

          $\kappa$    $\alpha$    $\sigma$    $\lambda$    V
AMLE      0.071       0.077       0.009       1.863        0.012
          (0.0170)    (0.0129)    (0.0004)    (0.3282)     (0.0015)
EL        0.072       0.076       0.008       1.862        0.013
          (0.0143)    (0.0136)    (0.0008)    (0.1569)     (0.0021)

(d) Inverse Gaussian OU model

          $\lambda$   $a$         $b$
EL        0.264       1.139       12.558
          (0.0342)    (0.1364)    (0.8970)

The proposed parameter estimates under each of the four univariate models considered 
in the simulation study are reported in Table 5. For comparison, the MLEs or the 
approximate MLEs are also reported, except for the inverse Gaussian OU model. For the 
univariate diffusion models VSK and CIR, and the jump diffusion model VSK-MJ, the 
proposed parameter estimates based on the CCF are very similar to the MLEs or the 
approximate MLEs. The EL estimates of the long-run mean $\alpha$ are 0.059 for VSK and 0.064 
for CIR, both of which are close to the summary statistic of mean rates (0.065). In VSK, 
the average volatility of the 3-month T-bill monthly return (difference) is estimated to be 
$\sigma\sqrt{\delta} = 0.018\sqrt{1/12} = 0.005$, which is also close to the summary statistic of volatility for 
the change (0.005). However, the conditional volatility of the monthly change in the CIR model is 
$\sigma\sqrt{\delta X_t}$, and $X_t$ has a long-run average of 0.064, which is less than 1. Therefore, the process 
needs a higher $\sigma$ (0.057) to bring the average volatility of the monthly change up to the 
level reflected by the real data. In the jump diffusion model VSK-MJ, our estimate 
of $\lambda$ suggests about 2 jumps per year on average. Relative to the VSK and CIR models, the 
estimate of the parameter $\sigma$ in the jump diffusion VSK-MJ model is much smaller (0.008), 
indicating that allowing jumps in the process helps to capture large movements in the 
interest rate and, as a result, the continuous part of the process does not have to be as 
volatile as the one in the VSK or CIR models. 

Table 6. Single-bandwidth and overall tests for the four models

Bandwidth                        0.010     0.012     0.014     0.016     0.018     Overall
VSK       Test stats             21.971    19.225    16.145    13.267    10.786    14.828
          Critical value (5%)    3.228     3.123     2.845     2.724     2.647     1.462
          p-values               0.0       0.0       0.0       0.0       0.0       0.0
CIR       Test stats             6.015     4.775     3.755     2.954     2.335     3.546
          Critical value (5%)    2.782     2.739     2.825     2.650     2.448     1.229
          p-values               0.0       0.01      0.02      0.026     0.054     0.0
VSK-MJ    Test stats             37.204    40.901    45.046    49.878    55.561    25.600
          Critical value (5%)    35.669    43.548    52.247    62.744    74.298    28.751
          p-values               0.046     0.074     0.102     0.126     0.148     0.0880
IG-OU     Test stats             10.716    9.374     7.962     6.663     5.528     6.870
          Critical value (5%)    40.463    47.665    46.444    42.396    41.750    27.940
          p-values               0.11      0.148     0.124     0.128     0.122     0.162
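
The volatility comparison between the VSK and CIR estimates can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the monthly sampling interval as 1/12 of a year, plugs in the EL estimates of the volatility parameter quoted in the text, and evaluates the CIR conditional volatility at the long-run level 0.064, which is an approximation rather than an exact average over the sample path. The variable names are chosen here for illustration.

```python
import math

delta = 1.0 / 12.0  # monthly sampling interval, in years

# VSK: the conditional volatility of a monthly change is sigma * sqrt(delta).
sigma_vsk = 0.018
vol_vsk = sigma_vsk * math.sqrt(delta)

# CIR: the conditional volatility is sigma * sqrt(delta * X_t); evaluate it at
# the long-run level of the rate (an approximation made for this illustration).
sigma_cir = 0.057
long_run_level = 0.064
vol_cir = sigma_cir * math.sqrt(delta * long_run_level)

# Value of sigma a CIR model would need to match a monthly volatility of 0.005
# exactly at the long-run level.
sigma_needed = 0.005 / math.sqrt(delta * long_run_level)

print(round(vol_vsk, 4), round(vol_cir, 4), round(sigma_needed, 3))
```

Because the rate level is far below 1, the square-root term shrinks the CIR volatility, which is why the CIR estimate of the volatility parameter is several times larger than the VSK estimate while both imply a monthly volatility of similar magnitude.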

We then applied the proposed test to assess the validity of each of the four models. The 
bandwidth prescribed by cross-validation (CV) was 0.01. By exploring the kernel estimators 
of the CCF, we found that a reasonable range for $h$ was from 0.01 to 0.018, which offered 
smoothness from slight under-smoothing to slight over-smoothing. The bandwidth set used in our 
empirical study therefore consisted of five equally spaced bandwidths ranging from 0.01 to 0.018. 
Table 6 reports the p-values of the single-bandwidth and overall tests for the four models. 
There is no empirical support for the VSK model. The CIR model performs a little 
better, as the distances between the test statistics and the critical values decrease, 
but the model is still rejected at the 0.05 significance level in the overall test and in almost 
all the single-bandwidth tests. We cannot reject the jump diffusion model VSK-MJ in 
the overall test or in the single-bandwidth tests, except the one with the smallest 
bandwidth (p-value = 0.046). This is a strong indication of the presence of jumps 
and implies that adding (finite-activity) jumps does help to capture the underlying 
dynamics of the interest rates. When infinite-activity jumps are allowed in the model, the 
p-values of the tests for the inverse Gaussian OU model are very supportive, even for 
the small bandwidths, suggesting that the infinite-activity jump model might 
model the dynamics of the 3-month T-bill rates better. A possible reason for this is that 
the jump diffusion model VSK-MJ can only generate small continuous movements from 
Brownian motion and big spikes from the compound Poisson component, so it could 
miss the movements in between (i.e., movements of medium size). The inverse 
Gaussian OU process, however, is more flexible since it can generate small, medium and 
big movements with infinite arrival rates; it could therefore fill a gap in the VSK-MJ 
model by capturing movements that are too large for Brownian motion to model but too 
small for the compound Poisson process to capture. 
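
The bandwidth set and the test decisions described above can be reproduced with a short sketch. The helper `bandwidth_grid` is a hypothetical name introduced here; the overall test statistics and 5% critical values are copied from Table 6, and the decision rule (reject when the statistic exceeds the critical value) is the usual one for an upper-tailed test.

```python
def bandwidth_grid(h_min, h_max, k):
    """Return k equally spaced bandwidths from h_min to h_max (inclusive)."""
    step = (h_max - h_min) / (k - 1)
    return [round(h_min + i * step, 3) for i in range(k)]


grid = bandwidth_grid(0.010, 0.018, 5)

# (overall test statistic, 5% critical value) for each model, from Table 6.
overall = {
    "VSK": (14.828, 1.462),
    "CIR": (3.546, 1.229),
    "VSK-MJ": (25.600, 28.751),
    "IG-OU": (6.870, 27.940),
}

# Reject the model when the statistic exceeds its critical value.
rejected = {model: stat > crit for model, (stat, crit) in overall.items()}

print(grid)
for model, flag in rejected.items():
    print(model, "rejected" if flag else "not rejected")
```

The grid matches the five bandwidths in the header of Table 6, and the decisions agree with the discussion: the two pure diffusion models are rejected in the overall test, while the two jump models are not.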

Appendix 

The following conditions are required in our analysis. 

C1. The stochastic processes given in (1.1) and (3.1) admit unique weak solutions, 
respectively, which are $\alpha$-mixing with mixing coefficient $\alpha(t) = C e^{-\lambda t}$, where 
$\alpha(t) = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \Omega_1^{s}, B \in \Omega_{s+t}^{\infty}\}$ for all $s, t \ge 1$, where $C$ is a finite 
positive constant, and $\Omega_i^{j}$ denotes the $\sigma$-field generated by $\{X_t : i \le t \le j\}$. 

C2 (Smoothness). $\psi_t(\tau;\theta) =: \psi(\tau;\theta, X_t)$ and $E\{\varepsilon_t(\tau;\theta)\}$ are three times continuously 
differentiable with respect to $\theta$ within a neighborhood of $\theta_0$, which is defined in C3. $\pi(\cdot)$ is 
a bounded probability density supported on a compact set $S \subset R^d$; and the diffusion 
function $\sigma(x)$ is positive definite. 

C3. The parameter space $\Theta$ is an open subset of $R^p$, and the true parameter $\theta_0$ 
is the unique root of $E\{\varepsilon_t(\tau;\theta)\} = 0$ for all $\tau \in S$; and for any $\theta_1 \ne \theta_2$, 
$P\{\psi_t(\cdot;\theta_1) \ne \psi_t(\cdot;\theta_2)\} > 0$. 

C4 (Invertibility). The Hermitian matrix $\mathrm{Var}\{\varepsilon_t(\tau;\theta_0)\}$ is positive definite almost 
everywhere for $\tau \in R^{2d}$ with respect to the Lebesgue measure in $R^{2d}$; $\Gamma(\theta_0)$ defined in (2.9) 
is invertible. 

C5. The kernel $K(\cdot)$ is an $r$th-order symmetric kernel supported on $[-1, 1]$ and 
has a bounded second derivative. We assume $d < 4$ and the smoothing bandwidth 
$h = O\{n^{-1/(d+2r)}\}$. The bandwidth set $\{h_1, \ldots, h_k\}$ satisfies $h_i = c_i h$ for constants $c_i$ such 
that $c_1 < c_2 < \cdots < c_k$, where $k$ is an integer not depending on $n$. 

C6. $\{\Delta_n(u; X_t)\}$ is a sequence of complex functions continuous at $u = 0$ with 
$\Delta_n(0; X_t) = 0$, $\sup_n |\Delta_n(u; X_t)| < M_1$ almost surely, and the Lebesgue measure of 
$\{u \mid \Delta_n(u, x) \ne 0\}$ is positive for all $x$ in the support of the marginal density $f$, and 
$c_n = n^{-1/2} h^{-d/4}$, which is the order of the difference between $H_0$ and $H_1$. 

We need C1 as the basic condition for the stochastic processes involved. Aït-Sahalia [1] 
and Genon-Catalot, Jeantheau and Laredo [22] provide conditions on the underlying 
processes under which Assumption C1 holds. In particular, Aït-Sahalia [1] provides conditions 
under which the observed sequences are $\beta$-mixing, which automatically implies $\alpha$-mixing. We require 
that the rate of decay is exponentially fast to simplify the technical arguments. C2 consists 
of smoothness conditions regarding the CCFs and C3 is for identification of parameters. 
C4 ensures that the covariance matrix is invertible, which is easier to justify for our 
low-dimensional formulation of the estimation and testing approaches. The conditions in C5 on the 
kernel and bandwidth are standard in nonparametric curve estimation. The assumption $d < 4$ 
makes the bias in the kernel estimation of smaller order than $h^{d/2}$, so that the bias is 
stochastically negligible relative to $\ell_{n,h}(\theta_0)$. The kernel method will encounter the curse 
of dimensionality when $d > 4$. Also, the commonly used processes in finance and other 
stochastic modeling tend to have dimension less than 4. The bandwidth selected by either 
cross-validation or the plug-in method satisfies the order specified in C5. The first part 
of C6 regarding $\Delta_n(u; X_t)$ is to qualify $\psi_t(u;\theta)$ under $H_1$ as a bona fide characteristic 
function, whereas the part that requires positive measure on the set $\{u \mid \Delta_n(u, x) \ne 0\}$ is 
to make $H_1$ a genuine sequence of alternative hypotheses. 
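
As an illustration of the kernel and bandwidth requirements in C5, the sketch below checks numerically that the biweight kernel is a second-order ($r = 2$) symmetric kernel on $[-1, 1]$ and builds a bandwidth set of the form $h_i = c_i h$. The choice of the biweight kernel, the constant `c`, the multipliers and the function names are all assumptions made for this example; the paper does not state which kernel was used.

```python
def biweight(u):
    """Biweight kernel: symmetric, supported on [-1, 1], bounded second derivative."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def kernel_moment(kernel, j, m=20000):
    """Approximate the j-th moment of the kernel on [-1, 1] by the midpoint rule."""
    width = 2.0 / m
    total = 0.0
    for i in range(m):
        u = -1.0 + (i + 0.5) * width
        total += (u ** j) * kernel(u)
    return total * width


def bandwidth_set(n, d, r, multipliers, c=1.0):
    """Bandwidths h_i = c_i * h with base bandwidth h = c * n^(-1/(d + 2r))."""
    h = c * n ** (-1.0 / (d + 2 * r))
    return [ci * h for ci in multipliers]


m0 = kernel_moment(biweight, 0)  # integrates to one
m1 = kernel_moment(biweight, 1)  # first moment vanishes by symmetry
m2 = kernel_moment(biweight, 2)  # first non-vanishing moment, so the order is r = 2

hs = bandwidth_set(n=410, d=1, r=2, multipliers=[0.8, 0.9, 1.0, 1.1, 1.2])
print(m0, m1, m2, hs)
```

A kernel of order $r$ has vanishing moments of orders 1 to $r - 1$ and a non-zero $r$th moment; for the biweight kernel the second moment equals 1/7, which the numerical integration reproduces.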

Proof of Lemma 1. By combining results in Kitamura [26] and Chen, Härdle and Li 
[11] for the empirical likelihood of $\alpha$-mixing processes, we can show that 

$$\lambda(\tau;\theta) = \Lambda^{-1}(\tau;\theta)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_t(\tau;\theta) + o(n^{-1/3}) = O(n^{-1/3}) \tag{A.1}$$

almost surely and uniformly in $\|\theta - \theta_0\| \le n^{-1/3}$ and $\tau \in S$. Denote $\theta = \theta_0 + u n^{-1/3}$. It 
follows from (A.1) and Taylor's expansion that, uniformly in $\|u\| = 1$, 



e n (0) 



/( n n *\ 

2^A T (r; 9)e t {r- 9) - ^{A T (r; e)e t (r; 6)} 2 Ur(r) dr + o(n^) 
{ t=i t=i J 

1 1 ^ o) + ^E ^%^-- 1/3 } AT 1 (r; 6) (A.2) 
i=l t=l ) 

. t=l t=l J 



nl E 



x j£ 

> -cn 1 / 3 
~ 2 



^g/^ ) ^" 1/3 (1 + o(1))}tt(t) dr + o^ 1 / 3 ) 



almost surely, where c > is the smallest eigenvalue of 
sup£ \ n — - \A l {t,t;9 ,9 )E\ 



res 



89 r — a # 



Similarly, 

UOo) = J ef(r; ) j A~\t, t; 9 , o ) j i e t (r; O ) W) dr + o(l) 
= o(nV3), 

almost surely. This, together with (A.2), implies that $\ell_n(\theta)$ has a minimum value in the 
interior of the ball $\|\theta - \theta_0\| \le n^{-1/3}$, and this value satisfies $\frac{\partial}{\partial\theta}\ell_n(\theta) = 0$, that is, the 
second equation in (2.7) by noting (2.4). The first equation follows directly from (2.4). □ 

Proof of Theorem 1. It follows from limit theorems for martingale difference that 



r o_ 

89 



1 n 9 f 3 1 

Q ln (r; 9 ,0) = -J2 4 M " E \ qq^ ^ o) \ > 

t=i ^ J 

1 ™ 

Qi«(t; 0„, 0) = -- X ?t(r; 0o)ef (r; O ) 4 ~M A(t, r; 9 ,9 )M*, 



dXT n S (A.3) 

^Q 2 „(t;0 o ,O) = O, 



^g 2 „(r; flo, 0) = 1 £ ^ £ t(r; flo) A £?{ iU ( T; ^ 
t=i ^ 

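uniformly in $\tau \in S$. Put $\delta_n = \|\hat\theta_n - \theta_0\| + \sup_{\tau \in S}\|\lambda(\tau;\hat\theta_n)\|$. Then it follows from 
Taylor's expansion that 

$$0 = Q_{1n}(\tau;\hat\theta_n,\lambda(\tau;\hat\theta_n)) = Q_{1n}(\tau;\theta_0,0) + \frac{\partial Q_{1n}(\tau;\theta_0,0)}{\partial\theta}(\hat\theta_n - \theta_0) + \frac{\partial Q_{1n}(\tau;\theta_0,0)}{\partial\lambda}\lambda(\tau;\hat\theta_n) + o_p(\delta_n) \tag{A.4}$$

uniformly in $\tau \in S$, and 

$$0 = \int Q_{2n}(\tau;\hat\theta_n,\lambda(\tau;\hat\theta_n))\pi(\tau)\,d\tau = \int \left\{ Q_{2n}(\tau;\theta_0,0) + \frac{\partial Q_{2n}(\tau;\theta_0,0)}{\partial\theta}(\hat\theta_n - \theta_0) + \frac{\partial Q_{2n}(\tau;\theta_0,0)}{\partial\lambda}\lambda(\tau;\hat\theta_n) \right\}\pi(\tau)\,d\tau + o_p(\delta_n). \tag{A.5}$$

By (A.3)-(A.5), we have 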



-r- 1 ^) J s|^£l^^o)}A- 1 (r;0 o ^o)A/ o - 1 if]e t (r;0 o V(r)dT + o p (5„). 



(A.6) 
Hence the theorem follows from (A.6) and the central limit theorem for martingale 
differences. □ 

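Proof of Theorem 2. Define $V(\tau_1, \tau_2, x; \theta_0, \theta) = E\{\varepsilon(\tau_1, X_t; \theta_0)\varepsilon^{*}(\tau_2, X_t; \theta) \mid X_t = x\}$ and 
write $V(\tau, x; \theta_0, \theta) = V(\tau, \tau, x; \theta_0, \theta)$. Since $\hat\theta_n$ is $\sqrt{n}$-consistent for $\theta_0$, we have 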

tnhik) = inh,i(0 o ) + nh d R~\K){(9 - 9 Q ) T S n , h {9 ) + S^ th {6 o )0 n - 9 ) 

+ (9 n - 6 Q ) T T n , h (e )(e n - 9 )} (A.7) 
+ O p {{nh d y 1 / 2 log 3 (n) + h 2 \og 2 (n)}, 

where 

Lh,i(0o) = nh d R- 1 (K) J j e k n , h {T,X t ;e Q )V- 1 {T,x;6 0t 6 ) 

x i n>h (r, x; 6 )in (r)/" 1 (x)ir 2 (x) dr dx, 

S n ,h(9o)= I I q -V- 1 (r,x;9 ,e )e nih (T,x;e ) 

x ~ki(t)it2{x) f~ l (x) dr dx, 

r nh (o o) = [ f^^v-Hr^Oo)^^ 

(A.8) 
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x 7ri(r)7r2(.T)/ _1 (a;) drdx. 

As $S_{n,h}(\theta_0) = O_p(n^{-1/2})$, 

$$\ell_{n,h}(\hat\theta_n) = \ell_{n,h,1}(\theta_0) + O_p\{(nh^{d})^{-1/2}\log^{3}(n) + h^{2}\log^{2}(n) + h^{d}\}. \tag{A.9}$$

Note that 

= nh d R-\K) J Jn- 1 ]T K h (x - X tl ){i*(T,X tl ) + c n rj*(T,X tl )} 

tl = l 

n 

xV-^T^-^Mn-^Khix-Xt,) 

(A.10) 

x {e(T,X t2 ) +c„fy„(T,A t2 )} 



X 7Ti (r)7r 2 (a)/ _1 (x) drdx + o p (/i d/2 ) 
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= + H n2 + H n3 + H ni ) + o p {h d f 2 ), 

where, with the choice of $c_n = n^{-1/2}h^{-d/4}$, 

H nl = n- 1 h d II K h {x - X tl )K h (x - Xt^t^Xt^V-^T.x) 
x i(T 7 X t2 )iT 1 (T)iT 2 (x)f~ 1 (x)dTdx 7 

H n2 =n- 1 h d Y J j I K&x-X t )?{T,X t )V- l {T,x)e(T t X t ) 
t=i J "* 

x 7ri(r)7T2(a:)/ _1 (a:) dr dx, 

H n3 = 2n 1 / 2 h 3d / 4 I I rtir^V-^r^n-^Khix-Xt^Xt) 
J J t=i 

x 7ri(r)7T2(a:)/ _:L (a;) dr dx, 
h d/2 J J r,* n (r, x^- 1 (r, a:)jj„ (r, x)n x {t)tt 2 (x)r\x) dr dx. 



H, 



We note that $H_{n2} = 2R(K) + o_p(h^{d})$ and the integral in $H_{n3}$ is $O_p(n^{-1/2})$. Hence, 
$H_{n3} = O_p(h^{3d/4}) = o_p(h^{d/2})$. 

Now consider $H_{n1}$. Clearly, $E(H_{n1}) = 0$ and the double summation in $H_{n1}$ constitutes 
a generalized U-statistic of order two with the kernel 



& Ilta = / / K h {x - X tl )K h (x - Xt^i^X^V^i^x-eMe^Xt,) 

x 7Ti(T)7r2(x)/ _1 (a;) drdx. 

The U-statistic is degenerate, due to $\{\varepsilon(\tau, X_{t_2})\}$ being martingale differences. 

Let $\sigma_n^2 = \sum_{1 \le t_1 \ne t_2 \le n} \sigma^2_{t_1,t_2}$, where $\sigma^2_{t_1,t_2} = \mathrm{Var}(\xi_{t_1,t_2})$. Then, applying the central limit 
theorem for generalized U-statistics for $\alpha$-mixing sequences (Gao and King [21]), we have 

$$\sigma_n^{-1}\sum_{t_1 \ne t_2}\xi_{t_1,t_2} \xrightarrow{d} N(0,1). \tag{A.11}$$

Furthermore, it can be shown, for instance, by following the route of Chen, Gao and 
Tang [10], that $\sigma_n^2 = 2n^2\sigma_{n0}^2\{1 + o(1)\}$, where $\sigma_{n0}^2 = E_{t_1}E_{t_2}\xi^2_{t_1,t_2}$. Here $E_{t_j}$ denotes the 
marginal expectation with respect to $(X_{t_j}, X_{t_j+1})$. 
It can be shown that 



'no 



E tl E t2 { K h { Xl - X tl )K h (xi - X t2 )K h (x 2 - X tl )K h (x 2 - X t2 ) 



X E e il( T l^tl) e fel( T l^t2) £ i2( T 2,^tl) 
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x e k2 {T 2 ,X t2 )v hM (r u x 1 )v l " h > (r 2 , .t 2 ) I 
x 7ri(Ti)7ri(T2)/ _1 (a;i)/ _1 (a;2)7r2(xi)7r2(a;2)dTidT2dxi da; 2 
E tl E t2 ^K h ( Xl - X tl )K h ( Xl - X t2 )K h (x 2 - X tl )K h {x 2 - X t2 ) (A.12) 

2 

X X! V hl2(- T l, T 2,Xt 1 )V kl k 2 (T ll -T 2 ,X t2 ) 
Il,ki,l2,k2 

x iri(T 1 )iri(T 2 )f~ 1 (x 1 )f~ 1 (x 2 )ir 2 (xi)ir 2 (x 2 )dT 1 dT 2 dx 1 da; 2 
= h- d 1 2 (K,V^ 1 ,-K 2 ){l + 0{h 2 )}, 

where $\gamma^2(K, V, \pi_1, \pi_2)$ is defined in (3.4). From (A.11) and (A.12), we have 

$$h^{-d/2}H_{n1} \xrightarrow{d} N(0, 2\gamma^2(K, V, \pi_1, \pi_2)). \tag{A.13}$$

This, together with the results on $H_{n2}$ and $H_{n3}$, leads to 

$$h^{-d/2}(\ell_{n,h}(\hat\theta_n) - 2 - \mu_n) \xrightarrow{d} N(0, 2R^{-2}(K)\gamma^2(K, V, \pi_1, \pi_2)), \tag{A.14}$$

where $\mu_n = H_{n4}$. This completes the proof of Theorem 2. □ 

Proof of Theorem 3. The proof follows by applying the Cramér-Wold device 
and the same technique as in the proof of Theorem 2, followed by the continuous mapping theorem. □ 
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Supplementary Material 

Comparisons in estimation and testing with other methods 

(DOI: 10.3150/11-BEJ400SUPP; .pdf). We compared our methods with Carrasco et al. 
[7] for estimation, and with Chen, Gao and Tang [10] for testing. The supplemental article 
(Chen, Peng and Yu [12]) provides additional tables from these comparisons. 
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